This chapter deals with basic information regarding Digital Image Processing using Python 3.8.13. I hope you will like it.
import numpy as np
import matplotlib.pyplot as plt
import cv2
from IPython.display import Image as ima
import numpy as np
from PIL import Image
import matplotlib.image as mping
import imageio
import glob
%matplotlib inline
We will use the following as our sample images. We will use the ipython image function to load and display the image.
# Display 18x18 pixel image.
ima(filename='checkerboard_18x18.png')
# Display 84x84 pixel image.
ima(filename='checkerboard_84x84.jpg')
OpenCV allows reading different types of images (JPG, PNG, etc). You can load grayscale images, color images or you can also load images with Alpha channel. It uses the cv2.imread() function which has the following syntax:
retval = cv2.imread( filename[, flags] )
retval: Is the image if it is successfully loaded. Otherwise it is None. This may happen if the filename is wrong or the file is corrupt.
The function has 1 required input argument and one optional flag:
filename: This can be an absolute or relative path. This is a mandatory argument.Flags: These flags are used to read an image in a particular format (for example, grayscale/color/with alpha channel). This is an optional argument with a default value of cv2.IMREAD_COLOR or 1 which loads the image as a color image.Before we proceed with some examples, let's also have a look at some of the flags available.
Flags
cv2.IMREAD_GRAYSCALE or 0: Loads image in grayscale modecv2.IMREAD_COLOR or 1: Loads a color image. Any transparency of image will be neglected. It is the default flag.cv2.IMREAD_UNCHANGED or -1: Loads image as such including alpha channel.Imread:https://docs.opencv.org/4.5.1/d4/da8/group__imgcodecs.html#ga288b8b3da0892bd651fce07b3bbd3a56
ImreadModes: https://docs.opencv.org/4.5.1/d8/d6a/group__imgcodecs__flags.html#ga61d9b0126a3e57d9277ac48327799c80
# Read image as gray scale.
# Print the image data (pixel values), element of a 2D numpy array.
cb_img = cv2.imread("checkerboard_18x18.png",0)
# Each pixel value is 8-bits [0,255]
print(cb_img)
[[ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0]]
# Display image.
plt.imshow(cb_img)
<matplotlib.image.AxesImage at 0x283034ed100>
Even though the image was read in as a gray scale image, it won't necessarily display in gray scale when using imshow(). matplotlib uses different color maps and it's possible that the gray scale color map is not set.
# Set color map to gray scale for proper rendering.
plt.imshow(cb_img, cmap='gray')
<matplotlib.image.AxesImage at 0x283035e4040>
# Read image as gray scale.
cb_img_fuzzy = cv2.imread("checkerboard_fuzzy_18x18.jpg",0)
# print image
print(cb_img_fuzzy)
# Display image.
plt.imshow(cb_img_fuzzy,cmap='gray')
[[ 0 0 15 20 1 134 233 253 253 253 255 229 130 1 29 2 0 0] [ 0 1 5 18 0 137 232 255 254 247 255 228 129 0 24 2 0 0] [ 7 5 2 28 2 139 230 254 255 249 255 226 128 0 27 3 2 2] [ 25 27 28 38 0 129 236 255 253 249 251 227 129 0 36 27 27 27] [ 2 0 0 4 2 130 239 254 254 254 255 230 126 0 4 2 0 0] [132 129 131 124 121 163 211 226 227 225 226 203 164 125 125 129 131 131] [234 227 230 229 232 205 151 115 125 124 117 156 205 232 229 225 228 228] [254 255 255 251 255 222 102 1 0 0 0 120 225 255 254 255 255 255] [254 255 254 255 253 225 104 0 50 46 0 120 233 254 247 253 251 253] [252 250 250 253 254 223 105 2 45 50 0 127 223 255 251 255 251 253] [254 255 255 252 255 226 104 0 1 1 0 120 229 255 255 254 255 255] [233 235 231 233 234 207 142 106 108 102 108 146 207 235 237 232 231 231] [132 132 131 132 130 175 207 223 224 224 224 210 165 134 130 136 134 134] [ 1 1 3 0 0 129 238 255 254 252 255 233 126 0 0 0 0 0] [ 20 19 30 40 5 130 236 253 252 249 255 224 129 0 39 23 21 21] [ 12 6 7 27 0 131 234 255 254 250 254 230 123 1 28 5 10 10] [ 0 0 9 22 1 133 233 255 253 253 254 230 129 1 26 2 0 0] [ 0 0 9 22 1 132 233 255 253 253 254 230 129 1 26 2 0 0]]
<matplotlib.image.AxesImage at 0x283036656d0>
Until now, we have been using gray scale images in our discussion. Let us now discuss color images.
# Read and display Coca-Cola logo.
ima("coca-cola-logo.png")
Let us read a color image and check the parameters. Note the image dimension.
# Read in image
coke_img = cv2.imread("coca-cola-logo.png",1)
# print the size of the image
print("Image size is ", coke_img.shape)
# print data-type of image
print("Data type of image is ", coke_img.dtype)
print("")
Image size is (700, 700, 3) Data type of image is uint8
plt.imshow(coke_img)
# What happened?
<matplotlib.image.AxesImage at 0x283036e21c0>
The color displayed above is different from the actual image. This is because matplotlib expects the image in RGB format whereas OpenCV stores images in BGR format. Thus, for correct display, we need to reverse the channels of the image. We will discuss about the channels in the sections below.
coke_img_channels_reversed = coke_img[:, :, ::-1]
plt.imshow(coke_img_channels_reversed)
<matplotlib.image.AxesImage at 0x28303741160>
cb_img = cv2.imread("checkerboard_color.png")
coke_img = cv2.imread("coca-cola-logo.png")
# Use matplotlib imshow()
plt.imshow(cb_img)
plt.title("matplotlib imshow")
plt.show()
# Use OpenCV imshow(), display for 8 sec
window1 = cv2.namedWindow("w1")
cv2.imshow(window1,cb_img)
cv2.waitKey(8000)
cv2.destroyWindow(window1)
cv2.destroyAllWindows()
# Use OpenCV imshow(), display for 8 sec
window2 = cv2.namedWindow("w2")
cv2.imshow(window2,coke_img)
cv2.waitKey(8000)
cv2.destroyWindow(window2)
cv2.destroyAllWindows()
# Use OpenCV imshow(), display until any key is pressed
window3 = cv2.namedWindow("w3")
cv2.imshow(window3,cb_img)
cv2.waitKey(0)
cv2.destroyWindow(window3)
cv2.destroyAllWindows()
# Use OpenCV imshow(), display until 'q' key is pressed
window4 = cv2.namedWindow("w4")
Alive = True
while Alive:
cv2.imshow(window4,coke_img)
keypress = cv2.waitKey(1)
if keypress == ord('q'):
Alive = False
cv2.destroyWindow(window4)
cv2.destroyAllWindows()
stop=1
im = Image.open("IMG_4781.JPG") # provide the correct path
print(im.width, im.height, im.mode, im.format, type(im))
3024 3024 RGB JPEG <class 'PIL.JpegImagePlugin.JpegImageFile'>
im.show() # display the image
im2 = imageio.imread("New_Zealand_Lake_SAVED.PNG")
print(type(im2), im2.shape, im2.dtype)
<class 'imageio.core.util.Array'> (600, 840, 3) uint8
plt.imshow(im2), plt.axis('off'), plt.show()
(<matplotlib.image.AxesImage at 0x283045b3460>, (-0.5, 839.5, 599.5, -0.5), None)
# read the image from disk as a numpy ndarray
im2 = mping.imread("New_Zealand_Lake_SAVED.PNG")
print(im2.shape, im2.dtype, type(im2)) ## 3 channels (RGB image)
(600, 840, 3) float32 <class 'numpy.ndarray'>
plt.figure(figsize=(10,10))
plt.imshow(im2) # display the image
plt.axis('off')
plt.show()
images = [cv2.imread(file) for file in glob.glob("*.PNG")]
plt.imshow(images[0])
<matplotlib.image.AxesImage at 0x28305e957f0>
plt.imshow(images[1])
<matplotlib.image.AxesImage at 0x2830453fe50>
ext = ['JPG', 'jpg', 'PNG'] # Add image formats here
files = []
[files.extend(glob.glob('*.' + e)) for e in ext]
images2 = [cv2.imread(file) for file in files]
plt.imshow(images2[0])
<matplotlib.image.AxesImage at 0x28303797760>
plt.imshow(images2[1])
<matplotlib.image.AxesImage at 0x28305f0f400>
plt.imshow(images2[2])
<matplotlib.image.AxesImage at 0x283058f8460>
cv2.split() Divides a multi-channel array into several single-channel arrays.
cv2.merge() Merges several arrays to make a single multi-channel array. All the input matrices must have the same size.
https://docs.opencv.org/4.5.1/d2/de8/group__core__array.html#ga0547c7fed86152d7e9d0096029c8518a
# Split the image into the B,G,R components
img_NZ_bgr = cv2.imread("New_Zealand_Lake.jpg",cv2.IMREAD_COLOR)
b, g, r = cv2.split(img_NZ_bgr)
# Show the channels
plt.figure(figsize=[20,5])
plt.subplot(141)
plt.imshow(r,cmap='gray')
plt.title("Red Channel")
plt.subplot(142)
plt.imshow(g,cmap='gray')
plt.title("Green Channel")
plt.subplot(143)
plt.imshow(b,cmap='gray')
plt.title("Blue Channel")
# Merge the individual channels into a BGR image
imgMerged = cv2.merge((b,g,r))
# Show the merged output
plt.subplot(144)
plt.imshow(imgMerged[:,:,::-1])
plt.title("Merged Output")
Text(0.5, 1.0, 'Merged Output')
cv2.cvtColor() Converts an image from one color space to another. The function converts an input image from one color space to another. In case of a transformation to-from RGB color space, the order of the channels should be specified explicitly (RGB or BGR). Note that the default color format in OpenCV is often referred to as RGB but it is actually BGR (the bytes are reversed). So the first byte in a standard (24-bit) color image will be an 8-bit Blue component, the second byte will be Green, and the third byte will be Red. The fourth, fifth, and sixth bytes would then be the second pixel (Blue, then Green, then Red), and so on.
dst = cv2.cvtColor( src, code )
dst: Is the output image of the same size and depth as src.
The function has 2 required arguments:
src input image: 8-bit unsigned, 16-bit unsigned ( CV_16UC... ), or single-precision floating-point.code color space conversion code (see ColorConversionCodes). cv2.cvtColor: https://docs.opencv.org/3.4/d8/d01/group__imgproc__color__conversions.html#ga397ae87e1288a81d2363b61574eb8cab
ColorConversionCodes: https://docs.opencv.org/4.5.1/d8/d01/group__imgproc__color__conversions.html#ga4e0972be5de079fed4e3a10e24ef5ef0
# OpenCV stores color channels in a differnet order than most other applications (BGR vs RGB).
img_NZ_rgb = cv2.cvtColor(img_NZ_bgr,cv2.COLOR_BGR2RGB)
plt.imshow(img_NZ_rgb)
<matplotlib.image.AxesImage at 0x28305fdba60>
img_hsv = cv2.cvtColor(img_NZ_bgr, cv2.COLOR_BGR2HSV)
# Split the image into the B,G,R components
h,s,v = cv2.split(img_hsv)
# Show the channels
plt.figure(figsize=[20,5])
plt.subplot(141)
plt.imshow(h,cmap='gray')
plt.title("H Channel")
plt.subplot(142)
plt.imshow(s,cmap='gray')
plt.title("S Channel")
plt.subplot(143)
plt.imshow(v,cmap='gray')
plt.title("V Channel")
plt.subplot(144)
plt.imshow(img_NZ_rgb)
plt.title("Original")
Text(0.5, 1.0, 'Original')
h_new = h+10
img_NZ_merged = cv2.merge((h_new,s,v))
img_NZ_rgb = cv2.cvtColor(img_NZ_merged, cv2.COLOR_HSV2RGB)
# Show the channels
plt.figure(figsize=[20,5])
plt.subplot(141)
plt.imshow(h,cmap='gray') # write h_new instead of h for better comaprison
plt.title("H Channel")
plt.subplot(142)
plt.imshow(s,cmap='gray')
plt.title("S Channel");
plt.subplot(143)
plt.imshow(v,cmap='gray')
plt.title("V Channel")
plt.subplot(144)
plt.imshow(img_NZ_rgb)
plt.title("Modified")
Text(0.5, 1.0, 'Modified')
Saving the image is as trivial as reading an image in OpenCV. We use the function cv2.imwrite() with two arguments. The first one is the filename, second argument is the image object.
The function imwrite saves the image to the specified file. The image format is chosen based on the filename extension (see cv::imread for the list of extensions). In general, only 8-bit single-channel or 3-channel (with 'BGR' channel order) images can be saved using this function (see the OpenCV documentation for further details).
cv2.imwrite( filename, img[, params] )
The function has 2 required arguments:
filename: This can be an absolute or relative path. img: Image or Images to be saved.Imwrite: https://docs.opencv.org/4.5.1/d4/da8/group__imgcodecs.html#gabbc7ef1aa2edfaa87772f1202d67e0ce
ImwriteFlags:https://docs.opencv.org/4.5.1/d8/d6a/group__imgcodecs__flags.html#ga292d81be8d76901bff7988d18d2b42ac
# save the image
cv2.imwrite("New_Zealand_Lake_SAVED.png",img_NZ_bgr)
ima(filename='New_Zealand_Lake_SAVED.png')
# read the image as Color
img_NZ_bgr = cv2.imread("New_Zealand_Lake_SAVED.png", cv2.IMREAD_COLOR)
print("img_NZ_bgr shape is: ", img_NZ_bgr.shape)
# read the image as Grayscaled
img_NZ_gry = cv2.imread("New_Zealand_Lake_SAVED.png", cv2.IMREAD_GRAYSCALE)
print("img_NZ_gry shape is: ", img_NZ_gry.shape)
img_NZ_bgr shape is: (600, 840, 3) img_NZ_gry shape is: (600, 840)
Let us see how to access a pixel in the image.
For accessing any pixel in a numpy matrix, you have to use matrix notation such as matrix[r,c], where the r is the row number and c is the column number. Also note that the matrix is 0-indexed.
For example, if you want to access the first pixel, you need to specify matrix[0,0]. Let us see with some examples. We will print one black pixel from top-left and one white pixel from top-center.
# Read image as gray scale.
cb_img = cv2.imread("checkerboard_18x18.png",0)
# Set color map to gray scale for proper rendering.
plt.imshow(cb_img, cmap='gray')
print(cb_img)
[[ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0]]
# print the first pixel of the first black box
print(cb_img[0,0])
# print the first white pixel to the right of the first black box
print(cb_img[0,6])
0 255
We can modify the intensity values of pixels in the same manner as described above.
cb_img_copy = cb_img.copy()
cb_img_copy[2,2] = 200
cb_img_copy[2,3] = 200
cb_img_copy[3,2] = 200
cb_img_copy[3,3] = 200
# Same as above
# cb_img_copy[2:3,2:3] = 200
plt.imshow(cb_img_copy,cmap='gray')
print(cb_img_copy)
[[ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 200 200 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 200 200 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [255 255 255 255 255 255 0 0 0 0 0 0 255 255 255 255 255 255] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0] [ 0 0 0 0 0 0 255 255 255 255 255 255 0 0 0 0 0 0]]
Cropping an image is simply achieved by selecting a specific (pixel) region of the image.
img_NZ_bgr = cv2.imread("New_Zealand_Boat.jpg",cv2.IMREAD_COLOR)
img_NZ_rgb = img_NZ_bgr[:,:,::-1]
plt.imshow(img_NZ_rgb)
<matplotlib.image.AxesImage at 0x28304690bb0>
# crop out the middle region of the image
cropped_region = img_NZ_rgb[200:400,300:600]
plt.imshow(cropped_region)
<matplotlib.image.AxesImage at 0x28304702880>
The function resize resizes the image src down to or up to the specified size. The size and type are derived from the src,dsize,fx, and fy.
dst = resize( src, dsize[, dst[, fx[, fy[, interpolation]]]] )
dst: output image; it has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.
The function has 2 required arguments:
src: input imagedsize: output image sizeOptional arguments that are often used include:
fx: Scale factor along the horizontal axis; when it equals 0, it is computed as (𝚍𝚘𝚞𝚋𝚕𝚎)𝚍𝚜𝚒𝚣𝚎.𝚠𝚒𝚍𝚝𝚑/𝚜𝚛𝚌.𝚌𝚘𝚕𝚜fy: Scale factor along the vertical axis; when it equals 0, it is computed as (𝚍𝚘𝚞𝚋𝚕𝚎)𝚍𝚜𝚒𝚣𝚎.𝚑𝚎𝚒𝚐𝚑𝚝/𝚜𝚛𝚌.𝚛𝚘𝚠𝚜The output image has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.
resized_cropped_region_2x = cv2.resize(cropped_region,None,fx=2,fy=2)
plt.imshow(resized_cropped_region_2x)
<matplotlib.image.AxesImage at 0x28304765be0>
desired_width = 100
desired_height = 200
dim = (desired_width,desired_height)
# Resize background image to see size as logo image
resized_cropped_region = cv2.resize(cropped_region,dsize=dim,interpolation=cv2.INTER_AREA)
plt.imshow(resized_cropped_region)
<matplotlib.image.AxesImage at 0x283047c2670>
# Method 2: Using 'dsize'
desired_width = 100
aspect_ratio = desired_width/cropped_region.shape[1]
desired_height = int(cropped_region.shape[0] * aspect_ratio)
dim = (desired_width,desired_height)
# Resize image
resize_cropped_region = cv2.resize(cropped_region,dsize=dim,interpolation=cv2.INTER_AREA)
plt.imshow(resize_cropped_region)
<matplotlib.image.AxesImage at 0x28304b99130>
# swap the channel order
resized_cropped_region_2x = resized_cropped_region_2x[:,:,::-1]
# save and resized image to disk
cv2.imwrite("resized_cropped_region_2x.png", resized_cropped_region_2x)
# Display the cropped and resized image
ima(filename='resized_cropped_region_2x.png')
import matplotlib.pylab as pylab
im = Image.open('IMG_4781.JPG')
im.show() # display the image
# using nearest neighbour interpolation
imu = im.resize((im.width*3,im.height*3),Image.NEAREST)
pylab.figure(figsize=(10,10)), pylab.imshow(imu), pylab.show()
(<Figure size 720x720 with 1 Axes>, <matplotlib.image.AxesImage at 0x283040a3eb0>, None)
# up-sample with bi-linear interpolation
im1 = im.resize((im.width*5, im.height*5), Image.BILINEAR)
pylab.figure(figsize=(20,20)), pylab.imshow(im1), pylab.show()
(<Figure size 1440x1440 with 1 Axes>, <matplotlib.image.AxesImage at 0x28304307520>, None)
im = im.resize((im.width//5,im.height//5))
pylab.figure(figsize=(15,10)), pylab.imshow(im), pylab.show()
(<Figure size 1080x720 with 1 Axes>, <matplotlib.image.AxesImage at 0x283042b2d00>, None)
The function flip flips the array in one of three different ways (row and column indices are 0-based):
dst = cv.flip( src, flipCode )
dst: output array of the same size and type as src.
The function has 2 required arguments:
src: input imageflipCode: a flag to specify how to flip the array; 0 means flipping around the x-axis and positive value (for example, 1) means flipping around y-axis. Negative value (for example, -1) means flipping around both axes.flip: https://docs.opencv.org/4.5.0/d2/de8/group__core__array.html#gaca7be533e3dac7feb70fc60635adf441
img_NZ_rgb_flipped_horz = cv2.flip(img_NZ_rgb,1)
img_NZ_rgb_flipped_vert = cv2.flip(img_NZ_rgb,0)
img_NZ_rgb_flipped_both = cv2.flip(img_NZ_rgb,-1)
# Show the images
plt.figure(figsize=[18,5])
plt.subplot(141)
plt.imshow(img_NZ_rgb_flipped_horz)
plt.title("Horizontal Flip")
plt.subplot(142)
plt.imshow(img_NZ_rgb_flipped_vert)
plt.title("Vertical Flip")
plt.subplot(143)
plt.imshow(img_NZ_rgb_flipped_both)
plt.title("Both Flipped")
plt.subplot(144)
plt.imshow(img_NZ_rgb)
plt.title("Original")
Text(0.5, 1.0, 'Original')
In this topic, we will learn about image annotation using OpenCV. We will learn how to peform the following annotations to images.
These are useful when you want to annotate your results for presentations or show a demo of your application. Annotations can also be useful during development and debugging.
Let's start off by drawing a line on an image. We will use cv2.line function for this.
img = cv2.line(img, pt1, pt2, color[, thickness[, lineType[, shift]]])
img: The output image that has been annotated.
The function has 4 required arguments:
img: Image on which we will draw a linept1: First point(x,y location) of the line segmentpt2: Second point of the line segmentcolor: Color of the line which will be drawnOther optional arguments that are important for us to know include:
thickness: Integer specifying the line thickness. Default value is 1.lineType: Type of the line. Default value is 8 which stands for an 8-connected line. Usually, cv2.LINE_AA (antialiased or smooth line) is used for the lineType.line:https://docs.opencv.org/4.5.1/d6/d6e/group__imgproc__draw.html#ga7078a9fae8c7e7d13d24dac2520ae4a2
Let's see an example of this.
# Read in an image
image = cv2.imread("Apollo_11_Launch.jpg", cv2.IMREAD_COLOR)
# Display the original image
plt.imshow(image[:,:,::-1])
<matplotlib.image.AxesImage at 0x283075bb760>
imageLine = image.copy()
# The line starts from (200,100) and ends at (400,100)
# The color of the line is YELLOW (Recall that OpenCV uses BGR format)
# Thickness of line is 5px
# Linetype is cv2.LINE_AA
cv2.line(imageLine,(200,100),(400,100),(0,255,255),thickness=5,lineType=cv2.LINE_AA)
# Display the image
plt.imshow(imageLine[:,:,::-1])
<matplotlib.image.AxesImage at 0x2830761fee0>
Let's start off by drawing a circle on an image. We will use cv2.circle function for this.
img = cv2.circle(img, center, radius, color[, thickness[, lineType[, shift]]])
img: The output image that has been annotated.
The function has 4 required arguments:
img: Image on which we will draw a linecenter: Center of the circleradius: Radius of the circlecolor: Color of the circle which will be drawnNext, let's check out the (optional) arguments which we are going to use quite extensively.
thickness: Thickness of the circle outline (if positive).
If a negative value is supplied for this argument, it will result in a filled circle.lineType: Type of the circle boundary. This is exact same as lineType argument in cv2.linecircle: https://docs.opencv.org/4.5.1/d6/d6e/group__imgproc__draw.html#gaf10604b069374903dbd0f0488cb43670
Let's see an example of this.
# Draw a circle
imageCircle = image.copy()
cv2.circle(imageCircle,(900,500),100,(0,0,255),thickness=5,lineType=cv2.LINE_AA)
# Display the image
plt.imshow(imageCircle[:,:,::-1])
<matplotlib.image.AxesImage at 0x2830465d8e0>
We will use cv2.rectangle function to draw a rectangle on an image. The function syntax is as follows.
img = cv2.rectangle(img, pt1, pt2, color[, thickness[, lineType[, shift]]])
img: The output image that has been annotated.
The function has 4 required arguments:
img: Image on which the rectangle is to be drawn.pt1: Vertex of the rectangle. Usually we use the top-left vertex here.pt2: Vertex of the rectangle opposite to pt1. Usually we use the bottom-right vertex here.color: Rectangle colorNext, let's check out the (optional) arguments which we are going to use quite extensively.
thickness: Thickness of the circle outline (if positive).
If a negative value is supplied for this argument, it will result in a filled rectangle.lineType: Type of the circle boundary. This is exact same as lineType argument in
cv2.line
### OpenCV Documentation Linksrectangle:https://docs.opencv.org/4.5.1/d6/d6e/group__imgproc__draw.html#ga07d2f74cadcf8e305e810ce8eed13bc9
Let's see an example of this.
# Draw a rectangle (thickness is a positive integer)
imageRectangle = image.copy()
cv2.rectangle(imageRectangle,(500,100),(700,600),(255,0,255),thickness=5,lineType=cv2.LINE_8)
# Display the image
plt.imshow(imageRectangle[:,:,::-1])
<matplotlib.image.AxesImage at 0x28312ac8820>
Finally, let's see how we can write some text on an image using cv2.putText function.
img = cv2.putText(img, text, org, fontFace, fontScale, color[, thickness[, lineType[, bottomLeftOrigin]]])
img: The output image that has been annotated.
The function has 6 required arguments:
img: Image on which the text has to be written.text: Text string to be written.org: Bottom-left corner of the text string in the image.fontFace: Font typefontScale: Font scale factor that is multiplied by the font-specific base size.color: Font colorOther optional arguments that are important for us to know include:
thickness: Integer specifying the line thickness for the text. Default value is 1.lineType: Type of the line. Default value is 8 which stands for an 8-connected line. Usually, cv2.LINE_AA (antialiased or smooth line) is used for the lineType.putText:https://docs.opencv.org/4.5.1/d6/d6e/group__imgproc__draw.html#ga5126f47f883d730f633d74f07456c576
Let's see an example of this.
imageText = image.copy()
text = "Apollo 11 Saturn V Launch, July 16, 1969"
fontScale = 2.3
fontFace = cv2.FONT_HERSHEY_PLAIN
fontColor = (0,255,0)
fontThickness = 2
cv2.putText(imageText,text,(200,700),fontFace,fontScale,fontColor,fontThickness,cv2.LINE_AA)
# Display the image
plt.imshow(imageText[:,:,::-1])
<matplotlib.image.AxesImage at 0x283076834c0>
Image Processing techniques take advantage of mathematical operations to achieve different results. Most often we arrive at an enhanced version of the image using some basic operations. We will take a look at some of the fundamental operations often used in computer vision pipelines. In this notebook we will cover:
The first operation we discuss is simple addition of images. This results in increasing or decreasing the brightness of the image since we are eventually increasing or decreasing the intensity values of each pixel by the same amount. So, this will result in a global increase/decrease in brightness.
img_bgr = cv2.imread("New_Zealand_Coast.jpg",cv2.IMREAD_COLOR)
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
# Display 18x18 pixel image.
ima(filename='New_Zealand_Coast.jpg')
matrix = np.ones(img_rgb.shape,dtype='uint8')*50
img_rgb_brighter = cv2.add(img_rgb,matrix)
img_rgb_darker = cv2.subtract(img_rgb,matrix)
# Show the image
plt.figure(figsize=[18,5])
plt.subplot(131)
plt.imshow(img_rgb_darker)
plt.title("Darker")
plt.subplot(132)
plt.imshow(img_rgb)
plt.title("Original")
plt.subplot(133)
plt.imshow(img_rgb_brighter)
plt.title("Brighter")
Text(0.5, 1.0, 'Brighter')
Just like addition can result in brightness change, multiplication can be used to improve the contrast of the image.
Contrast is the difference in the intensity values of the pixels of an image. Multiplying the intensity values with a constant can make the difference larger or smaller ( if multiplying factor is < 1 ).
matrix1 = np.ones(img_rgb.shape) * .8
matrix2 = np.ones(img_rgb.shape) * 1.2
img_rgb_darker = np.uint8(cv2.multiply(np.float64(img_rgb),matrix1))
img_rgb_brighter = np.uint8(cv2.multiply(np.float64(img_rgb),matrix2))
# Show the images
plt.figure(figsize=[18,5])
plt.subplot(131)
plt.imshow(img_rgb_darker)
plt.title("Lower Contrast")
plt.subplot(132)
plt.imshow(img_rgb)
plt.title("Original")
plt.subplot(133)
plt.imshow(img_rgb_brighter)
plt.title("Higher Contrast")
Text(0.5, 1.0, 'Higher Contrast')
Can you see the weird colors in some areas of the image after multiplication?
The issue is that after multiplying, the values which are already high, are becoming greater than 255. Thus, the overflow issue. How do we overcome this?
matrix1 = np.ones(img_rgb.shape) * .8
matrix2 = np.ones(img_rgb.shape) * 1.2
img_rgb_darker = np.uint8(cv2.multiply(np.float64(img_rgb),matrix1))
img_rgb_brighter = np.uint8(np.clip(cv2.multiply(np.float64(img_rgb),matrix2),0,255))
# Show the images
plt.figure(figsize=[18,5])
plt.subplot(131)
plt.imshow(img_rgb_darker)
plt.title("Lower Contrast")
plt.subplot(132)
plt.imshow(img_rgb)
plt.title("Original")
plt.subplot(133)
plt.imshow(img_rgb_brighter)
plt.title("Higher Contrast")
Text(0.5, 1.0, 'Higher Contrast')
Binary Images have a lot of use cases in Image Processing. One of the most common use cases is that of creating masks. Image Masks allow us to process on specific parts of an image keeping the other parts intact. Image Thresholding is used to create Binary Images from grayscale images. You can use different thresholds to create different binary images from the same original image.
retval, dst = cv2.threshold( src, thresh, maxval, type[, dst] )
dst: The output array of the same size and type and the same number of channels as src.
The function has 4 required arguments:
src: input array (multiple-channel, 8-bit or 32-bit floating point).thresh: threshold value.maxval: maximum value to use with the THRESH_BINARY and THRESH_BINARY_INV thresholding types.type: thresholding type (see ThresholdTypes).dst = cv.adaptiveThreshold( src, maxValue, adaptiveMethod, thresholdType, blockSize, C[, dst] )
dst Destination image of the same size and the same type as src.
The function has 6 required arguments:
src: Source 8-bit single-channel image.
maxValue: Non-zero value assigned to the pixels for which the condition is satisfied
adaptiveMethod: Adaptive thresholding algorithm to use, see AdaptiveThresholdTypes. The BORDER_REPLICATE | BORDER_ISOLATED is used to process boundaries.thresholdType: Thresholding type that must be either THRESH_BINARY or THRESH_BINARY_INV, see ThresholdTypes.blockSize: Size of a pixel neighborhood that is used to calculate a threshold value for the pixel: 3, 5, 7, and so on.C: Constant subtracted from the mean or weighted mean (see the details below). Normally, it is positive but may be zero or negative as well.https://docs.opencv.org/4.5.1/d7/d1b/group__imgproc__misc.html#gae8a4a146d1ca78c626a53577199e9c57 https://docs.opencv.org/4.5.1/d7/d4d/tutorial_py_thresholding.html
img_read = cv2.imread("building-windows.jpg", cv2.IMREAD_GRAYSCALE)
retval, img_thresh = cv2.threshold(img_read,100,255,cv2.THRESH_BINARY)
# Show the images
plt.figure(figsize=[18,5])
plt.subplot(121); plt.imshow(img_read, cmap="gray"); plt.title("Original");
plt.subplot(122); plt.imshow(img_thresh, cmap="gray"); plt.title("Thresholded");
print(img_thresh.shape)
(572, 800)
Suppose you wanted to build an application that could read (decode) sheet music. This is similar to Optical Character Recognigition (OCR) for text documents where the goal is to recognize text characters. In either application, one of the first steps in the processing pipeline is to isolate the important information in the image of a document (separating it from the background). This task can be accomplished with thresholding techniques. Let's take a look at an example.
# Read the original image
img_read = cv2.imread("Piano_Sheet_Music.png", cv2.IMREAD_GRAYSCALE)
# Perform global thresholding
retval, img_thresh_gbl_1 = cv2.threshold(img_read,50,255,cv2.THRESH_BINARY)
# Perform global thresholding
retval, img_thresh_gbl_2 = cv2.threshold(img_read,130,255,cv2.THRESH_BINARY)
# Perform adaptive thresholding
img_thresh_adp = cv2.adaptiveThreshold(img_read,255,cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,11,7)
# Show the images
plt.figure(figsize=[18,15])
plt.subplot(221)
plt.imshow(img_read,cmap="gray")
plt.title("Original")
plt.subplot(222)
plt.imshow(img_thresh_gbl_1,cmap="gray")
plt.title("Thresholded (global: 50)")
plt.subplot(223)
plt.imshow(img_thresh_gbl_2,cmap="gray")
plt.title("Thresholded (global: 130)")
plt.subplot(224)
plt.imshow(img_thresh_adp,cmap="gray")
plt.title("Thresholded (adaptive)")
Text(0.5, 1.0, 'Thresholded (adaptive)')
Example API for cv2.bitwise_and(). Others include: cv2.bitwise_or(), cv2.bitwise_xor(), cv2.bitwise_not()
dst = cv2.bitwise_and( src1, src2[, dst[, mask]] )
dst: Output array that has the same size and type as the input arrays.
The function has 2 required arguments:
src1: first input array or a scalar.src2: second input array or a scalar.An important optional argument is:
mask: optional operation mask, 8-bit single channel array, that specifies elements of the output array to be changed.https://docs.opencv.org/4.5.1/d0/d86/tutorial_py_image_arithmetics.html https://docs.opencv.org/4.5.0/d2/de8/group__core__array.html#ga60b4d04b251ba5eb1392c34425497e14
img_rec = cv2.imread("rectangle.jpg", cv2.IMREAD_GRAYSCALE)
img_cir = cv2.imread("circle.jpg", cv2.IMREAD_GRAYSCALE)
plt.figure(figsize=[20,5])
plt.subplot(121);plt.imshow(img_rec,cmap='gray')
plt.subplot(122);plt.imshow(img_cir,cmap='gray')
print(img_rec.shape)
(200, 499)
result = cv2.bitwise_and(img_rec,img_cir,mask=None)
plt.imshow(result,cmap='gray')
<matplotlib.image.AxesImage at 0x283cb1db3d0>
result = cv2.bitwise_or(img_rec,img_cir,mask=None)
plt.imshow(result,cmap='gray')
<matplotlib.image.AxesImage at 0x283cadff1c0>
result = cv2.bitwise_xor(img_rec,img_cir,mask=None)
plt.imshow(result,cmap='gray')
<matplotlib.image.AxesImage at 0x283cae2f610>
In this section we will show you how to fill in the white lettering of the Coca-Cola logo below with a background image.
ima(filename='Logo_Manipulation.png')
img_bgr = cv2.imread("coca-cola-logo.png")
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
plt.imshow(img_rgb)
print(img_rgb.shape)
logo_w = img_rgb.shape[0]
logo_h = img_rgb.shape[1]
(700, 700, 3)
# Read in image of color cheackerboad background
img_background_bgr = cv2.imread("checkerboard_color.png")
img_background_rgb = cv2.cvtColor(img_background_bgr, cv2.COLOR_BGR2RGB)
# Set desired width (logo_w) and maintain image aspect ratio
aspect_ratio = logo_w / img_background_rgb.shape[1]
dim = (logo_w,int(img_background_rgb.shape[0]*aspect_ratio))
# Resize background image to sae size as logo image
img_background_rgb = cv2.resize(img_background_rgb,dim,interpolation=cv2.INTER_AREA)
plt.imshow(img_background_rgb)
print(img_background_rgb.shape)
(700, 700, 3)
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2GRAY)
# Apply global thresholding to creat a binary mask of the logo
retval, img_mask = cv2.threshold(img_gray,127,255,cv2.THRESH_BINARY)
plt.imshow(img_mask,cmap="gray")
print(img_mask.shape)
(700, 700)
# Create an inverse mask
img_mask_inv = cv2.bitwise_not(img_mask)
plt.imshow(img_mask_inv,cmap="gray")
<matplotlib.image.AxesImage at 0x283cb75d730>
# Create colorful background "behind" the logo lettering
img_background = cv2.bitwise_and(img_background_rgb, img_background_rgb, mask=img_mask)
plt.imshow(img_background)
<matplotlib.image.AxesImage at 0x283cb7c95b0>
# Create colorful background "behind" the logo lettering
img_foreground = cv2.bitwise_and(img_rgb, img_rgb, mask=img_mask_inv)
plt.imshow(img_foreground)
<matplotlib.image.AxesImage at 0x283cbe95430>
# Add the two previous results obtain the final result
result = cv2.add(img_background,img_foreground)
plt.imshow(result)
cv2.imwrite("logo_final.png", result[:,:,::-1])
True